Concept
speech processing
Parents
Children
Dialog Systems, Emotion Recognition, Information Retrieval, Language Recognition, Machine Translation
Publications: 77.9K
Citations: 4.2M
Authors: 113K
Institutions: 10.5K
Perception-Centered Speech Processing
1948 - 1974
Speech research from 1948 to 1974 framed spoken communication as a perceptually anchored coding problem, guiding analysis, synthesis, and evaluation of intelligibility through perceptual cues. Phonetic and formant analysis became the core toolkit, enabling automatic formant extraction, onset-time discrimination, and phoneme-boundary detection within processing pipelines. Neural and brain-level studies linked speech processing to cognitive substrates and hemispheric specialization, while production–perception integration highlighted the interactive nature of articulation and linguistic structure. Early computational work framed automatic recognition and analysis-by-synthesis as foundational approaches for speech technology.
Historical Significance: The period produced foundational articulatory–acoustic models connecting vocal-tract shaping to acoustic output, underpinning work on formant transitions and coarticulation and providing impetus for later production research, speech synthesis, and coding. Foundational perception experiments clarified cue integration and the role of audition in recognition, influencing robust ASR and auditory scene analysis. The Perception of the Speech Code introduced invariant cues enabling phonetic decoding, shaping cognitive models and resilient recognition. Speech Analysis Synthesis and Perception demonstrated the perceptual validity of vocoder-like parameterization, foreshadowing advances in coding and synthesis. The general aeroacoustic theory of sound generation provided a baseline for articulatory phonetics and early production models.
• Perception-centric analyses treated speech as a coded signal whose perceptual cues guide analysis, synthesis, and interpretation, shaping how researchers model speech coding, processing, and intelligibility across perceptual and linguistic foundations [1], [2], [5], [9], [12], [19].
• Phonetic and formant-oriented analysis emerged as the core methodological toolkit for revealing structure in voiced speech, driving automatic formant extraction, onset-time discrimination, and phoneme-boundary detection within signal processing pipelines [3], [7], [9], [12], [15], [16], [19].
• Neural and brain-level investigations linked speech processing to cognitive and neural substrates, emphasizing hemispheric asymmetry, neurolinguistics, and neural processing of noncanonical signals such as backwards speech [6], [8], [10].
• Production–perception integration emphasized context, predictability, pauses, grammar use, and articulatory–phonetic relations, treating speech as an interactive system where production, perception, and linguistic structure shape each other [4], [13], [14], [18], [20].
• Early computational work framed automatic recognition and analysis-by-synthesis as the backbone of speech technology, yielding vowel recognition programs, phonetic-pattern detection, and spectral reduction for automated speech analysis [15], [17], [19].
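The automatic formant extraction recurring in the bullets above is commonly illustrated with linear-prediction (LPC) analysis: fit an all-pole model to a speech frame and read formant frequencies off the angles of the polynomial's complex roots. The sketch below is a minimal modern illustration of that idea, not a reconstruction of any method in the cited works; the function names `lpc` and `formants`, the model order, and the thresholds are all assumptions chosen for clarity.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns the prediction polynomial A(z) = 1 + a1*z^-1 + ... + ap*z^-p
    as a coefficient array [1, a1, ..., ap].
    """
    # Biased autocorrelation of the frame, lags 0..len(frame)-1
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction-error energy
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Update polynomial: a_j <- a_j + k * a_{i-j}, j = 1..i
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k
    return a

def formants(a, fs):
    """Candidate formant frequencies (Hz) from the roots of A(z).

    Keeps one root of each complex-conjugate pair (positive imaginary
    part) and converts its angle to a frequency at sample rate fs.
    """
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0.01]  # upper half plane only
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    return np.sort(freqs)
```

A typical use is to window a short voiced frame (e.g. `np.hamming`), run `lpc` with an order of roughly 2 + fs/1000, and report the lowest root frequencies as F1, F2, and so on; real systems add bandwidth and frequency thresholds to discard spurious roots.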
Parametric-Probabilistic Speech Processing (Pre-HMM Era)
1975 - 1981
Parallel Interactive Speech Perception
1982 - 1988
Multiresolution Psychoacoustic Speech Processing
1989 - 1995
Neural-Cognitive Speech Integration
1996 - 2002
Auditory-Motor Speech Dynamics
2003 - 2009
End-to-End Neural Speech
2010 - 2016
End-to-End Speech Synthesis
2017 - 2023